This notebook will explore the Ridge property data as modeled by FVS and the Ecotrust Growth-Yield-Batch system. Also serves as a demonstration of pandas and associated python libraries.
First, we import the necessary libraries
In [1]:
%matplotlib inline
from matplotlib.pylab import plt
import pandas as pd
from sqlalchemy import create_engine
from matplotlib import cm
import seaborn as sns
Create a connection ("engine") to the sqlite database produced by GYB and read the entire table into a pandas DataFrame.
In [2]:
engine = create_engine('sqlite:///data.db')
df = pd.read_sql_table('trees_fvsaggregate', engine)
Ipython is not the cleanest interface with which to browse large datasets. Luckily you can just take the top couple rows
In [3]:
df.head(6)
Out[3]:
or write it to excel with df.to_excel('file.txt'). Keep in mind the size of the dataset though, not recommended for the full dataset due to limitations in excel
In [4]:
df.shape # (rows, columns)
Out[4]:
Basic descriptive statistics
In [5]:
df.describe()
Out[5]:
The best feature of both pandas and excel: pivot tables. And note that we can write the resulting DataFrame out to an excel file for easier viewing.
In [6]:
import numpy as np
pt_year = pd.pivot_table(df,
index=['cond', 'rx', 'offset'],
columns=['year'],
values=['removed_merch_bdft'],
aggfunc=[np.sum],
margins=True)
pt_year.to_excel("harvest_by_year.xls")
pt_year.head()
Out[6]:
In testing in the Forest Planner, we noticed that yields were very low initially and that stands were starting the simulation at single-digit trees per acre (TPA). Let's confirm that we've resolved that issue.
First subset the query for the grow-only rx in 2013.
In [7]:
startdf = df.query("year == 2013 and rx == 1")
# same result with alternate syntax using .loc
startdf = df.loc[(df.year == 2013) & (df.rx == 1)]
Examining the distribution of starting TPA, we see reasonable TPAs
In [8]:
startdf.start_tpa.hist()
Out[8]:
In [9]:
conds = df.cond.unique()
conds.sort()
conds
Out[9]:
In [12]:
plt.rcParams['figure.figsize'] = (10.0, 8.0)
sns.tsplot(df.loc[(df.offset == 0)],
"year", unit="cond", condition="rx", value="after_merch_ft3")
Out[12]:
In [11]:
from pandas.tools.plotting import scatter_matrix
scatter_matrix(df[['after_qmd', 'after_tpa']],
alpha=0.2, figsize=(6, 6), diagonal='kde')
Out[11]: